I. INTRODUCTION

In recent decades, the world population has been increasing
rapidly and, due to this increase, the global energy
demanded and consumed is also growing [6]. Residential and
commercial buildings are identified as major energy
consumers worldwide, accounting for about 30% of the global
electricity demand related to energy consumption in the
residential sector [7]. Buildings are responsible for a
significant share of energy waste as well. Energy waste and
climate change represent a challenge for sustainability, and
it is crucial to make buildings more efficient [11].
Therefore, the development and use of clean products and
renewable energy in buildings have gained wide interest [6].
In the residential and commercial sectors, photovoltaic (PV)
systems are the most common form of distributed generation,
minimizing demand dependence on traditional power plants and
maximizing household self-sufficiency [8].

Due to PV's dependence on weather conditions, the
intermittent nature of the power generated brings some
uncertainty [24]. Similarly, the electricity consumption of
these buildings also has inherent uncertainties due to
seasonality. The easiest way to manage the risk of solar
power and harness this power is to forecast the amount of
power to be generated [15] as well as the consumption. A
reliable forecast is key for various smart grid applications
such as dispatch, active demand response, grid regulation
and smart energy management [12].

The energy consumption of a building and the PV generation
can be represented by a time series with trends and
seasonality [14]. There are numerous prediction studies on
time series, from classical linear regressions to more
recent works using machine learning algorithms, which are
powerful tools in predicting electricity consumption and PV
generation [21]. Recently, many PV power forecasting
techniques have been developed, but there is still no
complete universal forecasting model and methodology to
ensure the accuracy of predictions. In this context,
Artificial Neural Networks (ANNs) are very popular machine
learning algorithms for object prediction and classification
and are based on the classical feed-forward neural network
approach [23]. ANNs are computing systems inspired by the
biological neural networks of the brain and by how neurons
work, pass and store information [13; 24].


Due to the accelerated development of computing
technology, ANNs have provided a powerful framework for
supervised learning [5]. Deep learning allows models
composed of multiple layers to learn data representations
[11]. Deep Neural Networks (DNNs) are inspired by the
structure of mammalian visual systems and are also
an important machine learning tool that has been widely
used in many fields [25]. A DNN employs an architecture of
multiple layers of neurons in an ANN and can represent
functions with higher complexity [5].


This work aimed at predicting the electricity
consumption of a commercial building using ANNs in
various architectures. Several ANN architectures were
tested, and a hybrid architecture (Dense,
Convolutional and Recurrent), originally described by
Lai et al. [4] and adapted for this case study, was selected.


II. FOUNDATION
2.1 Time Series 


Time series are sets of observations ordered in time [14].
A time series can be defined as a class of phenomena
whose observational process and consequent numerical
quantification generate a sequence of observations
distributed over time.


Electricity consumption histories are univariate time
series [20] with trends, cycles, seasonality and
randomness. Trends are long-term characteristics
related to a time interval. Cycles are long-term
oscillations, more or less regular, around a trend line or
curve. Seasonalities are regular patterns that repeat at
fixed intervals. Finally, randomness comprises effects that
occur randomly and cannot be captured by cycles, trends or
seasonalities.
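The components described above can be illustrated with a minimal additive decomposition. This is only a sketch under stated assumptions: the function names, the synthetic series and the odd period of 5 are hypothetical and chosen for illustration; it is not the decomposition procedure used in this work.

```python
from statistics import mean

def centered_moving_average(series, window):
    """Centered moving average for an odd window; None where the window does not fit."""
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = mean(series[i - half:i + half + 1])
    return trend

def additive_decompose(series, period):
    """Split a series into trend, seasonal and residual parts (additive model, odd period)."""
    trend = centered_moving_average(series, period)
    # Average the detrended values that share the same position in the cycle.
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    seasonal_pattern = [mean(b) for b in buckets]
    seasonal = [seasonal_pattern[i % period] for i in range(len(series))]
    # Whatever trend and seasonality do not explain is treated as randomness.
    residual = [None if t is None else y - t - s
                for y, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Synthetic example (hypothetical data): linear trend plus a zero-mean pattern of period 5.
pattern = [-20.0, -10.0, 0.0, 10.0, 20.0]
series = [100.0 + 2.0 * i + pattern[i % 5] for i in range(30)]
trend, seasonal, residual = additive_decompose(series, period=5)
```

Because the synthetic series has no noise, the residual is essentially zero wherever the trend is defined; with real consumption data the residual would carry the random component.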


The time series prediction models most used
in the literature are based on linear and polynomial
regression. Among the regression models, we can
mention the SARIMAX method [19]. This statistical
model is a variant of the autoregressive moving average
(ARMA) model that adds differencing to make the series
stationary (I), a seasonal component (S) and the
effect of exogenous (X) variables over time. In
this work, the SARIMAX model was used as a baseline:
its results on the test case are compared with the
results obtained from the other prediction models.
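The role of differencing (the "I" in SARIMAX) and of its seasonal counterpart can be seen in a small sketch. The series below is hypothetical, and the `difference` helper is introduced here only for illustration; it is not part of any SARIMAX library.

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series: linear trend plus a repeating pattern of period 4.
pattern = [0.0, 5.0, 10.0, 5.0]
series = [50.0 + 1.5 * t + pattern[t % 4] for t in range(16)]

# Seasonal differencing (lag = period) cancels the repeating pattern.
seasonal_diff = difference(series, lag=4)
# Ordinary differencing then removes what is left of the trend.
stationary = difference(seasonal_diff, lag=1)
```

After both steps the toy series is constant, which is the sense in which differencing makes a trending, seasonal series stationary before the ARMA terms are fitted.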


2.2 Convolutional Artificial Neural Networks 


Convolutional Artificial Neural Networks (CNNs) are a type
of DNN that is commonly applied to analyse images. One
of the main attributes of a CNN is that it chains different
processing layers that generate an effective representation
of image features such as edges. The architecture of a CNN
allows multiple layers of these processing units to be
stacked, so this deep learning model can emphasize the
relevance of features at different scales [24].

Fig. 1 shows a typical architecture of a
CNN, composed of at least a convolution layer, a pooling
layer, a flattening layer and dense layers.


Fig. 1. Basic CNN. 


Source: The author 


In the convolution layer, a filter (a kernel, which is
also a matrix) is applied to the input matrix, aiming at its
reduction while maintaining its most important
characteristics. Fig. 2 represents, step by step, the
application of the convolution function, where g(x,y)
represents an element of the convolution matrix, that is,
the sum of the element-wise products of the submatrix
colored in Fig. 2 and the kernel. At each step the kernel
shifts one position to the right until the last column of
the input matrix; it then shifts one line down and
continues the process until it has run through the
whole input matrix. In the example of Fig. 2, a 7x7 input
matrix was reduced to a 5x5 convolution matrix. The
whole process represented in Fig. 2 is repeated for each of
the kernels used, generating several convolution matrices.


$$g(x,y) = \omega * f(x,y) = \sum_{dx=-a}^{a} \sum_{dy=-b}^{b} \omega(dx,dy)\, f(x+dx,\, y+dy)$$

where $\omega$ is the kernel and $f$ is the input matrix.

Before pooling, it is usual to apply an activation
function, for example ReLU, f(x) = max(0, x). The pooling
layer then generates a new reduced matrix, as shown in Fig. 3.

Finally, the flattening layer simply
transforms the matrices of the pooling layers into
vectors, which will be the inputs of the dense layer.
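The sequence of convolution, activation, pooling and flattening can be sketched in plain Python. Several choices below are assumptions made for illustration and are not stated in the text: a 3x3 kernel with stride 1 and no padding (the setting that turns a 7x7 input into a 5x5 map), 2x2 max pooling, and the toy input values. The loop offsets are counted from the top-left corner of each window, which is equivalent to the centered offsets dx, dy of the formula up to a shift.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += kernel[dy][dx] * image[y + dy][x + dx]
            row.append(acc)
        output.append(row)
    return output

def relu(matrix):
    """Apply f(x) = max(0, x) to every element."""
    return [[max(0.0, v) for v in row] for row in matrix]

def max_pool2d(matrix, size=2):
    """Keep the largest value of each non-overlapping size x size window."""
    rows = len(matrix) // size
    cols = len(matrix[0]) // size
    return [[max(matrix[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]

def flatten(matrix):
    """Turn a matrix into a vector, row by row."""
    return [v for row in matrix for v in row]

# Hypothetical 7x7 input with negative and positive values.
image = [[float(r * 7 + c - 24) for c in range(7)] for r in range(7)]
# Hypothetical 3x3 kernel that simply picks the center of each window.
kernel = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0]]

feature_map = convolve2d_valid(image, kernel)   # 5x5 convolution matrix
activated = relu(feature_map)                   # negatives set to zero
pooled = max_pool2d(activated, size=2)          # reduced matrix
dense_input = flatten(pooled)                   # vector for the dense layer
```

In a trained CNN the kernel values are learned rather than fixed, and one feature map is produced per kernel.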
Fig. 2: Convolution process

Source: The author.

[Figure: convolved matrix (feature map) before and after applying the ReLU function f(x) = max(0, x)]

Fig. 3. Pooling Process.

Source: The Author.


2.3 Recurrent Artificial Neural Networks 


In traditional ANNs, the inputs (and outputs) are 
independent of each other, making it difficult to use them, 
for example, in natural language processing where a word 
in a sentence depends on previous words in the same 
sentence, or in time series where we need to know the 
values over time for better projections. 


In contrast, recurrent artificial neural networks
(RNNs) [8] store their previous state and also use it as
input to the current state for calculations of new outputs.
Another way of thinking about RNNs is that they have a
"memory" that captures information about what has been
calculated so far. In theory, RNNs can make use of
information in arbitrarily long sequences, but in practice,
they are limited to looking back only a few steps. Fig. 4 is
a typical representation of an RNN.


Fig. 4: Basic RNN.
Source: The Author.

Fig. 4 shows an RNN being expanded into a
complete network, where $x_t$ is the input in time step t. For
example, $x_1$ could be a one-hot vector corresponding to the
second word of a sentence. $s_t$ is the hidden state in
time step t. It is the "memory" of the network. $s_t$ is
calculated based on the previous hidden state and the input
in the current time step:

$$s_t = f(U x_t + W s_{t-1})$$

The function $f$ is usually a nonlinearity, such as tanh or
ReLU. $s_{-1}$, which is needed to compute the first hidden
state, is usually initialized with zeros. $o_t$ is the output
in step t. For example, if we wanted to predict the next
word in a sentence, it would be a probability vector over
our vocabulary:

$$o_t = \mathrm{softmax}(V s_t)$$

By expanding, we simply
mean that we write the network for the complete sequence.
For example, if the sequence we are interested in is a
5-word sentence, the network would be unfolded into a
5-layer neural network, one layer for each word.
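The recurrence can be written out as a short forward pass. This is a minimal sketch: the matrix values, the vocabulary of three words and the hidden size of two are hypothetical, and tanh is assumed as the nonlinearity.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softmax(values):
    """Turn scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the maximum avoids overflow
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_forward(inputs, U, W, V):
    """Apply s_t = tanh(U x_t + W s_{t-1}) and o_t = softmax(V s_t) over a sequence."""
    state = [0.0] * len(W)  # s_{-1}, initialized with zeros
    outputs = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, state))]
        state = [math.tanh(p) for p in pre]
        outputs.append(softmax(matvec(V, state)))
    return outputs, state

# Hypothetical weights: 3-word vocabulary (one-hot inputs), hidden state of size 2.
U = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]]
W = [[0.1, 0.2], [-0.2, 0.3]]
V = [[0.7, -0.1], [0.0, 0.5], [-0.4, 0.2]]
sentence = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]  # 5 words

outputs, final_state = rnn_forward(sentence, U, W, V)
```

The loop body is executed once per word, which corresponds to the five unfolded layers mentioned above; the same U, W and V are reused at every step.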


III. MATERIAL AND METHODS


This work was carried out at the Research Group on
Intelligent Engineering and Computing for Advanced
Innovation and Development (GECAD,
http://www.gecad.isep.ipp.pt/GECAD/Pages/Pubs/PublicationsPES.aspx),
a research centre located at the Instituto Superior de
Engenharia do Porto of the Instituto Politécnico do Porto
(ISEP/IPP), Porto, Portugal. Similarly to the HyFIS2 model
(Jozi et al., 2016), the proposed model uses the actual
electrical consumption data of sectors of Building N of
ISEP/IPP, where GECAD is located. The building has five
energy meters that store the electrical energy consumption
data of specific sectors of the building, with a time
interval of 10 seconds. This information, as well as
meteorological data, is stored automatically in a SQL
server through agents developed in Java.


To validate the model described below, tests were
performed using the same consumption data applied to the
SARIMAX model and HyFIS2. The laboratories sector of
Building N was excluded because it has a large
variation in consumption due to the experiments conducted
there, which generate many outliers in the consumption
history. For the experiments, the consumption stored every
ten seconds was averaged by hour, because the goal is to
predict the next hour of consumption.
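The hourly averaging step can be sketched as follows. The readings are synthetic and the function name is hypothetical; the original data sit in a SQL server whose schema is not described here, so the sketch starts from (timestamp, watts) pairs already loaded into memory.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def hourly_average(readings):
    """Average (timestamp, watts) readings within each clock hour."""
    buckets = defaultdict(list)
    for timestamp, watts in readings:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(watts)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}

# Hypothetical readings: two hours of 10-second samples (360 samples per hour).
start = datetime(2019, 4, 8, 0, 0, 0)
readings = [(start + timedelta(seconds=10 * i), 1000.0 if i < 360 else 2000.0)
            for i in range(720)]

hourly = hourly_average(readings)
```

Each key of `hourly` is the start of a clock hour and each value is the mean of the readings that fall inside that hour.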


3.1 The Long and Short Time series Network Adapted (LSTNetA) Model

The model developed for energy consumption prediction
was based on the model proposed by Lai [4], represented
in Fig. 5, which consists of a hybrid ANN with three
distinct layers. First, a convolutional layer receives the
time series as input and extracts its short-term patterns.
The output of this layer is the input of the recurrent
layer, which memorizes historical information of the time
series. The output of the recurrent layer is, in turn, the
input of the fully connected dense layer. Finally, the
output of the dense layer is combined with the output of the
autoregressive linear regression (ARMA) [26], ensuring
that the output will have the same scale as the input, thus
composing the prediction.

Fig. 5. Architecture of the LSTNetA model.

Source: adapted from Lai [4].
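The reason for adding an autoregressive component can be shown with a toy sketch. It assumes, as in the LSTNet design, that the final prediction is the sum of the nonlinear output and the linear autoregressive output; the weights are hypothetical and the `nonlinear_component` function is only a bounded stand-in for the convolutional, recurrent and dense path, not the real network.

```python
import math

def ar_component(window, weights, bias=0.0):
    """Linear autoregression: weighted sum of the most recent observations."""
    return bias + sum(w * y for w, y in zip(weights, window))

def nonlinear_component(window, weights):
    """Bounded stand-in for the neural path (NOT the actual LSTNetA layers)."""
    return math.tanh(sum(w * y for w, y in zip(weights, window)))

def predict(window, ar_weights, nn_weights):
    """Assumed combination rule: neural output plus autoregressive output."""
    return nonlinear_component(window, nn_weights) + ar_component(window, ar_weights)

# Hypothetical last three hourly consumptions (watts) and weights.
window = [4800.0, 5200.0, 5000.0]
ar_weights = [0.2, 0.3, 0.5]
nn_weights = [0.001, -0.002, 0.001]

prediction = predict(window, ar_weights, nn_weights)

# If consumption grows tenfold, the linear part follows the new scale,
# while the bounded nonlinear part cannot.
scaled_window = [10.0 * y for y in window]
scaled_ar = ar_component(scaled_window, ar_weights)
scaled_nn = nonlinear_component(scaled_window, nn_weights)
```

The linear part scales with the input while the bounded part does not, which is why the autoregressive output keeps the prediction on the same scale as the series.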


Fig. 6 summarizes the implementation of the
LSTNetA network. The convolution layer is represented
by the Conv2D class, the recurrent layer by the GRU
classes, the dense layer by the Dense classes, and the
autoregression by the PostARTrans class.

It is important to note that the recurrent layer uses
one of the RNN variants, the GRU (Gated Recurrent Unit)
[1]. This ANN model, like the LSTM (Long Short-Term
Memory), aims to solve the short-term memory problem of
RNNs, which, in long series, have difficulty
carrying the results of earlier steps over to later ones.




Fig. 6: Summary of the LSTNet implementation. 
Source: The Author. 


In the backpropagation stage, the learning process
of ANNs, RNNs suffer from the vanishing gradient
problem. Gradients are the values used to update the
weights of neural networks. The vanishing gradient problem
occurs when the gradients propagated backward during
network training are multiplied by values smaller than 1
at each network layer passed through, arriving at the
initial network layers with tiny values. This causes the
weight adjustments, calculated at each training iteration,
to be too small, and makes training more expensive.

Thus, in RNNs, the layers that receive a small
gradient update stop learning. As a result, RNNs can
forget what was seen in longer sequences, thus having a
short-term memory.
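A numeric sketch makes the effect concrete. The local factor of 0.5 and the 24 steps are hypothetical values chosen for illustration; real factors depend on the weights and activation derivatives of the network.

```python
def backpropagated_gradient(initial_gradient, local_factor, steps):
    """Multiply a gradient by the same local factor at every step it passes through."""
    gradient = initial_gradient
    history = [gradient]
    for _ in range(steps):
        gradient *= local_factor
        history.append(gradient)
    return history

# Hypothetical case: a factor of 0.5 per time step over 24 steps.
history = backpropagated_gradient(1.0, 0.5, 24)
earliest_step_gradient = history[-1]
```

After 24 multiplications the gradient is below one ten-millionth of its initial value, so the weights that act on the earliest steps of the sequence are barely updated.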


Fig. 7 shows a typical architecture of a GRU.
What makes it different from a standard RNN are
the reset gate and the update gate, which, by applying the
sigmoid and tanh activation functions, define
whether the previous output $h_{t-1}$ will be considered or
discarded in the calculation of the new output.
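A single GRU step can be sketched for a scalar input and a scalar hidden state. The gate equations follow one common textbook convention, which is an assumption here (libraries differ on whether the update gate weights the old or the new state), and all parameter names and values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU step for a scalar input x and scalar previous output h_prev."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # The reset gate decides how much of h_prev enters the candidate output.
    candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # The update gate blends the previous output with the candidate.
    return (1.0 - z) * h_prev + z * candidate

# Hypothetical parameters.
params = {"wz": 0.5, "uz": 0.5, "bz": 0.0,
          "wr": 0.5, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}

h = 0.0
for x in [0.2, 0.4, 0.1]:
    h = gru_step(x, h, params)
```

When the update gate is close to zero the previous output passes through almost unchanged, which is the mechanism that lets a GRU carry information across many steps.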


Fig. 7. Typical architecture of a GRU. 
Source: The Author 


The LSTNetA model was developed in the
Python programming language, version 3.7 [17], using
TensorFlow version 2.0 [22], the machine learning library
developed by Google.


IV. RELATED WORKS 


Fig. 8 represents the power consumption time series used
to train and test the SARIMAX, LSTNetA and HyFIS2
models. The top graph represents the historical
series of consumption in watts, from zero hours
on 08/04/2019 to eight hours on 20/12/2019. The middle
graph shows the calculated trend of the series and the
bottom graph its seasonality.

Fig. 8. Historical series of consumption.


Source: The Author. 


4.1 SARIMAX 


As seen previously, the SARIMAX method is a statistical
method of time series analysis that enables prediction
through linear regressions. Thus, it cannot be characterized
as a machine learning algorithm. In the scope of this work,
it was applied to obtain predictions from a widely used
model, for comparison with the proposed
model and with the HyFIS2 model.

To verify the accuracy of all models covered in
this work, the last 120 records, corresponding to five days
of consumption, were used for comparison between real
and predicted consumption, shown in Fig. 9. The error used
to verify the results of all models in this work was
the root mean square error (RMSE, described in
chapter 01), shown in Fig. 10. The application of
this model resulted in an average RMSE of 604.72, which
was taken as the accuracy of this model in this work.
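For reference, the RMSE is commonly defined as below. The symbols are assumptions introduced here for illustration: $y_i$ is the real consumption at test record $i$, $\hat{y}_i$ is the predicted consumption and $n$ is the number of test records (120 in this work).

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones, and the result is expressed in the same unit as the consumption (watts).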


Fig. 9. Comparison Real Consumption X SARIMAX.

Source: The Author.

Fig. 10. Verified errors of the SARIMAX method.


Source: The author. 


4.2 Model HyFIS2 


The HyFIS2 (Hybrid neural Fuzzy Inference System) 
model uses a hybrid approach with the combination of 
dense ANN and fuzzy logic. The system includes five 
layers, as shown in Fig. 11. In the first layer, the nodes are 
the inputs that transmit signals to the next layer. In the 
second and fourth layers, the nodes act as membership 
functions to express the input-output fuzzy linguistic 
variables. In these layers, the fuzzy sets defined for the 
input-output variables are represented as: large (L), 
medium (M) and small (S). However, for some 
applications, these can be more specific and represented 
as, for example, large positive (LP), small positive (SP), 
zero (ZE), small negative (SN) and large negative (LN). In 
the third layer, each node is a rule node and represents a 
fuzzy rule. The connection weights between the third and 
the fourth layer represent certainty factors of the associated 
rules, i.e., each rule is activated and controlled by the 
weight values. Finally, the fifth layer contains the node 
that represents the output of the system. 
Fig. 11. Neuro-Fuzzy structure of the HyFIS2 model.
Source: Jozi [9] 


For the prediction of electricity consumption, as in all
models tested, the last 120 historical records were used,
corresponding to five days of consumption. The
comparison between real and predicted consumption is
shown in Fig. 12. Fig. 13 shows the RMSE errors
calculated. The application of this model resulted in an
average RMSE of 602.71, which was taken as the
accuracy of this model in this work.

Fig. 12. Real Consumption Comparison X HyFIS2.

Source: The Author.

Fig. 13. Verified errors of the HyFIS2 model.


Source: The Author. 


V. APPLICATION OF THE LSTNETA MODEL 


The training of the LSTNetA ANN was
performed as previously described, using the real
electricity consumption data of Building N of
ISEP/IPP, where GECAD is located, except for the
laboratory sector. The historical series analyzed ran from
zero hours on 08/04/2019 to eight hours on 20/12/2019,
with measurements every ten seconds aggregated by hour,
resulting in 4186 records, each containing time and
consumption. The training was performed with a learning
rate of 0.0003, using the Adam [10] stochastic
gradient descent optimization method to update the weights
in the backpropagation process. For the initial weights of
the ANN, the VarianceScaling [3] algorithm was used, which
generates initial weights with values on the same scale as
the inputs. The convolution kernel used was a 6x6 identity
matrix and a training loop with 1000 epochs was
performed. All these parameters were obtained
experimentally and the ones with the best results were
selected.
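The weight update performed by Adam can be sketched for a single parameter. Only the learning rate of 0.0003 comes from the text; the decay rates `beta1`, `beta2` and the constant `eps` are the defaults commonly quoted for Adam and are assumptions here, as is the toy loss being minimized.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.0003, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter; t is the 1-based step counter."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (hypothetical): minimize (theta - 3)^2 starting from theta = 0.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t)
```

Each step moves the parameter by roughly the learning rate, whatever the raw size of the gradient, so a small rate such as 0.0003 needs many iterations to cover a large distance.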




Fig. 14. Comparison Real Consumption X LSTNetA. 
Source: The Author. 


For the prediction of electricity consumption, as
in all models tested, the last 120 historical records were
used, corresponding to five days of consumption. The
comparison between real and predicted consumption is
shown in Fig. 14. Fig. 15 shows the RMSE errors
calculated. The application of this model resulted in an
average RMSE of 198.44, which was taken as the
accuracy of this model in this work.

Fig. 15. Verified errors of the LSTNetA model.
Source: The Author. 


VI. RESULTS AND CONCLUSION 


Table 1 shows a fragment of the results of the
three models. The Date and Time column identifies each
record; the Actual Consumption column shows the actual
electricity consumption in watts at that date and time; the
LSTNetA column shows the prediction of this model and the
Error - LSTNetA column its absolute error; the HyFIS2 and
Error - HyFIS2 columns show the prediction and absolute
error of the HyFIS2 model; finally, the SARIMAX and
Error - SARIMAX columns show the prediction and absolute
error of the SARIMAX model.

Comparing the results of the SARIMAX, HyFIS2
and LSTNetA models, it can be observed, as shown in Fig.
16, that the LSTNetA method, with the data used for
testing, presented the predictions closest to the real
consumption of electricity: the red line, which represents
the predictions of the LSTNetA model, overlaps the blue
line, which represents the real consumption, over most of
the period. This demonstrates a prediction very
close to the real consumption value, with low errors.


Table 1. Fragment of Predictions and Errors of the 3 models

Date and Time    | Actual Consumption | LSTNetA | Error - LSTNetA | HyFIS2  | Error - HyFIS2 | SARIMAX | Error - SARIMAX
19/12/2019 09:00 | 4759.38            | 4824.27 | 64.8900         | 3427.13 | 1332.2500      | 4721.76 | 37.6190
19/12/2019 10:00 | 6781.51            | 6685.28 | 96.2346         | 6583.38 | 198.1300       | 5516.26 | 1265.2476
19/12/2019 11:00 | 7279.10            | 7194.26 | 84.8373         | 5798.56 | 1480.5400      | 6124.20 | 1154.8976
19/12/2019 12:00 | 6332.88            | 6247.08 | 85.8038         | 5798.38 | 534.5000       | 5497.10 | 835.7849
19/12/2019 13:00 | 5350.34            | 5569.95 | 219.6063        | 6322.98 | 972.6400       | 5653.27 | 302.9276
19/12/2019 14:00 | 6677.56            | 6499.50 | 178.0639        | 5798.37 | 879.1900       | 5197.56 | 1479.9983
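The absolute errors in Table 1 can be recomputed from the actual and predicted columns, and an RMSE can be calculated over the same rows. The sketch below uses only the six rows of the fragment, so its RMSE values are not expected to match the ones reported for the full 120-record test set; the function and variable names are chosen here for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Values copied from the Table 1 fragment (19/12/2019, 09:00 to 14:00), in watts.
actual = [4759.38, 6781.51, 7279.10, 6332.88, 5350.34, 6677.56]
predictions = {
    "LSTNetA": [4824.27, 6685.28, 7194.26, 6247.08, 5569.95, 6499.50],
    "HyFIS2": [3427.13, 6583.38, 5798.56, 5798.38, 6322.98, 5798.37],
    "SARIMAX": [4721.76, 5516.26, 6124.20, 5497.10, 5653.27, 5197.56],
}

for name, predicted in predictions.items():
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    print(name, [round(e, 2) for e in errors], round(rmse(actual, predicted), 2))
```

On this fragment the LSTNetA predictions give the lowest RMSE of the three models, in line with the ranking reported for the complete test period.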


[Figure: Real X Predicted Consumption; series: Actual Consumption, LSTNetA, HyFIS2, SARIMAX; x-axis: Hours]

Fig. 16. Comparison of Real Consumption X Prediction Models.


Source: The Author. 


Fig. 17 represents the errors (RMSE) of the three
models, allowing a comparison of the accuracy of the
predictions of each method. It shows that the
LSTNetA method was more accurate in its
predictions than the SARIMAX and HyFIS2
methods. This is corroborated by the data
presented in Table 2, where the total average error of the
LSTNetA model is significantly lower than that of the
other models.
Fig. 17. Comparisons of errors verified in all models.
Source: The Author.

Table 2. RMSE of the 3 Models Tested

     | Error - LSTNetA | Error - HyFIS2 | Error - SARIMAX
RMSE | 198.4496        | 602.7109       | 604.5810
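The gap between the models in Table 2 can be expressed as a relative reduction in RMSE. The helper name below is hypothetical, and the calculation uses only the three values from the table.

```python
# RMSE values copied from Table 2.
rmse_by_model = {"LSTNetA": 198.4496, "HyFIS2": 602.7109, "SARIMAX": 604.5810}

def relative_reduction(baseline, candidate):
    """Fraction by which `candidate` is lower than `baseline`."""
    return (baseline - candidate) / baseline

for name in ("HyFIS2", "SARIMAX"):
    reduction = relative_reduction(rmse_by_model[name], rmse_by_model["LSTNetA"])
    print(f"LSTNetA vs {name}: {reduction:.1%} lower RMSE")
```

For both baselines the reduction is about two thirds, that is, the LSTNetA error is roughly one third of the error of the other two models on this test set.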